Dynamic Processing Slots Scheduling for I/O-Intensive Jobs of Hadoop on Pathology Data
Authors
Abstract
The increasing use of computing resources in our daily lives leads to data being generated at an astonishing rate. The computing industry is repeatedly questioned for its ability to accommodate this unpredictable growth, which has encouraged the development of cluster-based data-intensive platforms. Hadoop, which consists of Hadoop MapReduce and the Hadoop Distributed File System (HDFS), is one such platform for large-scale data storage and processing. As the amount of data worldwide grows rapidly and processing scales up, distributed processing has become common, and Hadoop has attracted many cloud computing enterprises and technology enthusiasts; its user base continues to expand. Our study aims to speed up the execution of jobs originated by Hadoop and Hive. We propose dynamic processing slot scheduling for I/O-intensive Hadoop MapReduce jobs, focusing on I/O wait observed while pathology data is processed on a Hadoop cluster or cloud. Assigning more tasks to dynamically added free slots whenever CPU resources with a high rate of I/O wait are detected on an active TaskTracker node leads to an improvement of about 30% in CPU performance.
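To make the mechanism concrete, below is a minimal sketch (ours, not the paper's implementation) of the monitoring loop the abstract describes: sample the node's iowait share from /proc/stat and grant extra map slots while it stays above a threshold. The threshold, the slot counts, and the set_map_slots hook are assumptions for illustration.

```python
import time

STAT_PATH = "/proc/stat"        # Linux aggregate CPU counters
IOWAIT_THRESHOLD = 0.30         # assumed threshold; not from the paper
BASE_SLOTS, EXTRA_SLOTS = 2, 6  # assumed default and bonus slot counts

def cpu_times():
    """Return (iowait, total) jiffies from the aggregate 'cpu' line."""
    with open(STAT_PATH) as f:
        values = [int(v) for v in f.readline().split()[1:]]
    return values[4], sum(values)  # index 4 is the iowait column

def iowait_fraction(interval=5.0):
    """Share of CPU time spent waiting on I/O over a sampling interval."""
    w0, t0 = cpu_times()
    time.sleep(interval)
    w1, t1 = cpu_times()
    return (w1 - w0) / max(t1 - t0, 1)

def set_map_slots(n):
    """Hypothetical hook: in a real deployment this would reconfigure
    the TaskTracker's slot count; here it only reports the decision."""
    print(f"map slots -> {n}")

if __name__ == "__main__":
    while True:
        if iowait_fraction() > IOWAIT_THRESHOLD:
            # CPU is mostly idle waiting on disk: overlap more tasks.
            set_map_slots(BASE_SLOTS + EXTRA_SLOTS)
        else:
            # CPU-bound again: fall back to the static default.
            set_map_slots(BASE_SLOTS)
```

In Hadoop 1.x the slot count is a static TaskTracker setting (mapred.tasktracker.map.tasks.maximum); a dynamic scheduler of this kind would adjust the effective value at runtime rather than printing it.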
Similar Papers
Scheduling Data Intensive Workloads through Virtualization on MapReduce based Clouds
MapReduce has become a popular programming model for running data-intensive applications on the cloud. Completion time goals or deadlines of MapReduce jobs set by users are becoming crucial in existing cloud-based data processing environments like Hadoop. There is a conflict between scheduling MR jobs to meet deadlines and “data locality” (assigning tasks to nodes that contain their input da...
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop Distributed File System's rack-aware data placement strategy for MapReduce in a homogeneous Hadoop cluster assume that each node in the cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...
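For contrast with that homogeneous assumption, capacity-aware placement can be sketched in a few lines (an illustration under assumed capacity values, not the algorithm from the cited paper): blocks are distributed in proportion to each node's relative computing capacity, so faster nodes receive more data.

```python
# Sketch: capacity-proportional block placement for a heterogeneous
# cluster. Node names and capacities are assumed example values.
capacities = {"node1": 4.0, "node2": 2.0, "node3": 1.0}  # relative speeds
total_blocks = 700

total_capacity = sum(capacities.values())
placement = {
    node: round(total_blocks * cap / total_capacity)
    for node, cap in capacities.items()
}
print(placement)  # {'node1': 400, 'node2': 200, 'node3': 100}
```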
Preemptive ReduceTask Scheduling for Fair and Fast Job Completion
Hadoop MapReduce adopts a two-phase (map and reduce) scheme to schedule tasks among data-intensive applications. However, under this scheme, Hadoop schedulers do not work effectively for both phases. We reveal that there exists a serious fairness issue among jobs of different sizes, leading to prolonged execution for small jobs, which are starving for reduce slots held by large jobs. To solve t...
Queuing Network Models to Predict the Completion Time of the Map Phase of MapReduce Jobs
Big Data processing is generally defined as a situation in which the size of the data itself becomes part of the computational problem. This has made divide-and-conquer type algorithms implemented in clusters of multi-core CPUs in Hadoop/MapReduce environments an important data processing tool for many organizations. Jobs of various kinds, which consist of a number of automatically parallelized ta...
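As a baseline for what such models predict, a first-order wave-based estimate (an illustrative simplification, not the queuing network model of the cited paper; all symbols are assumptions) is:

$$T_{\text{map}} \approx \left\lceil \frac{n}{m \, s} \right\rceil \cdot \bar{t}$$

where $n$ is the number of map tasks, $m$ the number of worker nodes, $s$ the map slots per node, and $\bar{t}$ the mean task service time. A queuing network model refines this by capturing contention and service-time variability instead of assuming uniform waves.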
Reliable and Locality Driven Scheduling in Hadoop
The increasing use of computing resources in our daily lives leads to data being generated at an unprecedented rate. The computing industry is being repeatedly questioned for its ability to accommodate the unpredictable growth rate of data, and its ability to process them. This has encouraged the development of cluster-based data-intensive applications. Hadoop is a popular open source framework k...
Journal:
Volume/Issue:
Pages: -
Publication date: 2014